Case Study 01: How Does a Bike-Share Navigate Speedy Success?¶

Author: Vinícius Alves

Date: 03/04/23

Version: 1.1

About the Company:¶

Cyclistic is a bike-share company that are working in Chicago. It has more than 5800 bikes that are geotracked and locked in a network of 692 station across the city.

The company has some flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships.

The Analysis¶

Ask phase:¶

In this case study we will analyze the data of users of a bike rental company. We want to know the difference between those users that have the subscription and those who use the service without commitment. After analyzig the data, ideas of how can we increase the number of annual memberships are desired.

The data that will be analyzed is from the past 12 months (march-2023).

Prepare phase:¶

The data used¶

The data used on this analysis is the past 12 months of users data from Cyclistic . The data has no personal information about the riders.

How is the data distributed?¶

The data is stored in a AWS bucket and already available in Wide format.

Bias¶

After analyzing the data, we can we can say that the data is ROCCC (concept used by Google):

  • Reliable
    • Because the data is not biased, it includes every types of users.
  • Original
    • The data was collected from the company (first-party data).
  • Comprehensive
    • All the titles are well written.
  • Current
    • The data was updated montly.
  • Cited
    • As others case study have already been made, I think we can agree with it.

About data integrity¶

As the company collected the data, it is hard to say that it was modified. And about the location that is stored, AWS, we can assume that is secure about any undesired changes in the data.

problems with the data?¶

There are some blank cells in the csv file. There are one archive that the name is not in the same pattern than the others. This will need some threatment.

In [ ]:
# Libraries
import pandas as pd
import numpy as np
import statistics as st

import plotly.express as px
In [6]:
# Load and Concatenate data

archives = ['202202-divvy-tripdata.csv',
            '202203-divvy-tripdata.csv',
            '202204-divvy-tripdata.csv',
            '202205-divvy-tripdata.csv',
            '202206-divvy-tripdata.csv',
            '202207-divvy-tripdata.csv',
            '202208-divvy-tripdata.csv', 
            '202209-divvy-publictripdata.csv',
            '202210-divvy-tripdata.csv',
            '202211-divvy-tripdata.csv',
            '202212-divvy-tripdata.csv',
            '202301-divvy-tripdata.csv',
            '202301-divvy-tripdata.csv']

data_year = pd.DataFrame()

for archive in archives:

  archive_df = pd.read_csv(f"Data_Cyclistic/{archive}", skipinitialspace=True, index_col=0)  
  data_year = pd.concat([data_year, archive_df], axis=0)

data_year
Out[6]:
rideable_type started_at ended_at start_station_name start_station_id end_station_name end_station_id start_lat start_lng end_lat end_lng member_casual
ride_id
E1E065E7ED285C02 classic_bike 2022-02-19 18:08:41 2022-02-19 18:23:56 State St & Randolph St TA1305000029 Clark St & Lincoln Ave 13179 41.884621 -87.627834 41.915689 -87.634600 member
1602DCDC5B30FFE3 classic_bike 2022-02-20 17:41:30 2022-02-20 17:45:56 Halsted St & Wrightwood Ave TA1309000061 Southport Ave & Wrightwood Ave TA1307000113 41.929143 -87.649077 41.928773 -87.663913 member
BE7DD2AF4B55C4AF classic_bike 2022-02-25 18:55:56 2022-02-25 19:09:34 State St & Randolph St TA1305000029 Canal St & Adams St 13011 41.884621 -87.627834 41.879255 -87.639904 member
A1789BDF844412BE classic_bike 2022-02-14 11:57:03 2022-02-14 12:04:00 Southport Ave & Waveland Ave 13235 Broadway & Sheridan Rd 13323 41.948150 -87.663940 41.952833 -87.649993 member
07DE78092C62F7B3 classic_bike 2022-02-16 05:36:06 2022-02-16 05:39:00 State St & Randolph St TA1305000029 Franklin St & Lake St TA1307000111 41.884621 -87.627834 41.885837 -87.635500 member
... ... ... ... ... ... ... ... ... ... ... ... ...
A303816F2E8A35A8 electric_bike 2023-01-11 17:46:23 2023-01-11 17:57:31 Clark St & Elm St TA1307000039 Southport Ave & Clybourn Ave TA1309000030 41.902634 -87.631591 41.920771 -87.663712 casual
BCDBB142CC610382 classic_bike 2023-01-30 15:08:10 2023-01-30 15:33:26 Western Ave & Leland Ave TA1307000140 Clarendon Ave & Gordon Ter 13379 41.966400 -87.688704 41.957867 -87.649505 member
7D1C7CA80517183B classic_bike 2023-01-06 19:34:50 2023-01-06 19:50:01 Clark St & Elm St TA1307000039 Southport Ave & Clybourn Ave TA1309000030 41.902973 -87.631280 41.920771 -87.663712 casual
1A4EB636346DF527 classic_bike 2023-01-13 18:59:24 2023-01-13 19:14:44 Clark St & Elm St TA1307000039 Southport Ave & Clybourn Ave TA1309000030 41.902973 -87.631280 41.920771 -87.663712 casual
069971675AC7DC62 electric_bike 2023-01-02 13:48:29 2023-01-02 13:59:29 Clark St & Elm St TA1307000039 Southport Ave & Clybourn Ave TA1309000030 41.902822 -87.631687 41.920771 -87.663712 casual

5944549 rows × 12 columns

In [7]:
print(f'The columns in the data are {list(data_year.columns)}')
print("Those that deserve some attention is the Start and End time, Start and end Station name/id and member_casual classification")
The columns in the data are ['rideable_type', 'started_at', 'ended_at', 'start_station_name', 'start_station_id', 'end_station_name', 'end_station_id', 'start_lat', 'start_lng', 'end_lat', 'end_lng', 'member_casual']
Those that deserve some attention is the Start and End time, Start and end Station name/id and member_casual classification

Data: https://divvy-tripdata.s3.amazonaws.com/index.html

Process phase:¶

As already seen, I am choosing to use a Jupyter Noteebok coded in python

Steps to clean the data:¶

  1. Concatenated the data;
  2. Check how many rows are with no data;
    • The number of rows are not so expressive to compromise the analysis. Without them, we still with 77% of the data;
  3. Check if there are some duplicates;
    • Discovered 148305 rows with the same ride_id;
  4. Check for inconsistent data;
    • Discovered 69 rows with inconsistent data;
  5. Saved the final DataFrame.

How many blank cells are in data?¶

In [8]:
data_year.isna().sum()
Out[8]:
rideable_type              0
started_at                 0
ended_at                   0
start_station_name    870246
start_station_id      870246
end_station_name      930495
end_station_id        930495
start_lat                  0
start_lng                  0
end_lat                 6026
end_lng                 6026
member_casual              0
dtype: int64

There is 241540 cells without information in start_station_name and start_station_id;

There is 264615 cells without information in end_station_name and end_station_id;

There is 1001 cells without information in end_lat and end_lng;



If I eliminate every cell without information, how many last?

In [9]:
new_data_year = data_year.dropna()
new_data_year
Out[9]:
rideable_type started_at ended_at start_station_name start_station_id end_station_name end_station_id start_lat start_lng end_lat end_lng member_casual
ride_id
E1E065E7ED285C02 classic_bike 2022-02-19 18:08:41 2022-02-19 18:23:56 State St & Randolph St TA1305000029 Clark St & Lincoln Ave 13179 41.884621 -87.627834 41.915689 -87.634600 member
1602DCDC5B30FFE3 classic_bike 2022-02-20 17:41:30 2022-02-20 17:45:56 Halsted St & Wrightwood Ave TA1309000061 Southport Ave & Wrightwood Ave TA1307000113 41.929143 -87.649077 41.928773 -87.663913 member
BE7DD2AF4B55C4AF classic_bike 2022-02-25 18:55:56 2022-02-25 19:09:34 State St & Randolph St TA1305000029 Canal St & Adams St 13011 41.884621 -87.627834 41.879255 -87.639904 member
A1789BDF844412BE classic_bike 2022-02-14 11:57:03 2022-02-14 12:04:00 Southport Ave & Waveland Ave 13235 Broadway & Sheridan Rd 13323 41.948150 -87.663940 41.952833 -87.649993 member
07DE78092C62F7B3 classic_bike 2022-02-16 05:36:06 2022-02-16 05:39:00 State St & Randolph St TA1305000029 Franklin St & Lake St TA1307000111 41.884621 -87.627834 41.885837 -87.635500 member
... ... ... ... ... ... ... ... ... ... ... ... ...
A303816F2E8A35A8 electric_bike 2023-01-11 17:46:23 2023-01-11 17:57:31 Clark St & Elm St TA1307000039 Southport Ave & Clybourn Ave TA1309000030 41.902634 -87.631591 41.920771 -87.663712 casual
BCDBB142CC610382 classic_bike 2023-01-30 15:08:10 2023-01-30 15:33:26 Western Ave & Leland Ave TA1307000140 Clarendon Ave & Gordon Ter 13379 41.966400 -87.688704 41.957867 -87.649505 member
7D1C7CA80517183B classic_bike 2023-01-06 19:34:50 2023-01-06 19:50:01 Clark St & Elm St TA1307000039 Southport Ave & Clybourn Ave TA1309000030 41.902973 -87.631280 41.920771 -87.663712 casual
1A4EB636346DF527 classic_bike 2023-01-13 18:59:24 2023-01-13 19:14:44 Clark St & Elm St TA1307000039 Southport Ave & Clybourn Ave TA1309000030 41.902973 -87.631280 41.920771 -87.663712 casual
069971675AC7DC62 electric_bike 2023-01-02 13:48:29 2023-01-02 13:59:29 Clark St & Elm St TA1307000039 Southport Ave & Clybourn Ave TA1309000030 41.902822 -87.631687 41.920771 -87.663712 casual

4585800 rows × 12 columns

If the cells without information are deleted, 4585800 rows lasts from 5944549 , 77% of the data.



Are there duplicate lines in the data?

In [12]:
new_data_year.duplicated().sum()
Out[12]:
148305

Yes, there is 148305 lines that was duplicated.



Deleting Duplicates

In [13]:
new_data_year_no_duplicates = new_data_year.drop_duplicates(keep='first')

Checking if there are incosistents, like, start_at > ended_at

In [14]:
new_data_year_no_duplicates['started_at'] = pd.to_datetime(new_data_year_no_duplicates['started_at'])
new_data_year_no_duplicates['ended_at'] = pd.to_datetime(new_data_year_no_duplicates['ended_at'])

new_data_year_no_duplicates[new_data_year_no_duplicates['started_at']> new_data_year_no_duplicates['ended_at']]
C:\Users\Spuck\AppData\Local\Temp\ipykernel_12828\2262464060.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_data_year_no_duplicates['started_at'] = pd.to_datetime(new_data_year_no_duplicates['started_at'])
C:\Users\Spuck\AppData\Local\Temp\ipykernel_12828\2262464060.py:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  new_data_year_no_duplicates['ended_at'] = pd.to_datetime(new_data_year_no_duplicates['ended_at'])
Out[14]:
rideable_type started_at ended_at start_station_name start_station_id end_station_name end_station_id start_lat start_lng end_lat end_lng member_casual
ride_id
2D97E3C98E165D80 classic_bike 2022-03-05 11:00:57 2022-03-05 10:55:01 DuSable Lake Shore Dr & Wellington Ave TA1307000041 DuSable Lake Shore Dr & Wellington Ave TA1307000041 41.936688 -87.636829 41.936688 -87.636829 casual
7407049C5D89A13D electric_bike 2022-03-05 11:38:04 2022-03-05 11:37:57 Sheffield Ave & Wellington Ave TA1307000052 Sheffield Ave & Wellington Ave TA1307000052 41.936313 -87.652522 41.936253 -87.652662 casual
072E947E156D142D electric_bike 2022-06-07 19:14:46 2022-06-07 17:07:45 W Armitage Ave & N Sheffield Ave 20254.0 W Armitage Ave & N Sheffield Ave 20254.0 41.920000 -87.650000 41.920000 -87.650000 casual
BF114472ABA0289C electric_bike 2022-06-07 19:14:47 2022-06-07 17:05:42 Base - 2132 W Hubbard Hubbard Bike-checking (LBS-WH-TEST) W Armitage Ave & N Sheffield Ave 20254.0 41.917831 -87.653363 41.920000 -87.650000 member
029D853B5C38426E classic_bike 2022-07-26 20:07:33 2022-07-26 19:59:34 Lincoln Ave & Roscoe St* chargingstx5 Lincoln Ave & Roscoe St* chargingstx5 41.943350 -87.670668 41.943350 -87.670668 member
... ... ... ... ... ... ... ... ... ... ... ... ...
2D98008FFB28C1B8 electric_bike 2022-11-06 01:56:17 2022-11-06 01:12:19 Wabash Ave & Grand Ave TA1307000117 Wells St & Elm St KA1504000135 41.891129 -87.626821 41.903222 -87.634324 casual
112ED5B9200BFD2A classic_bike 2022-11-06 01:46:10 2022-11-06 01:06:44 Sheffield Ave & Webster Ave TA1309000033 Wells St & Institute Pl 22001 41.921540 -87.653818 41.897380 -87.634420 casual
417746CBEB92A34E classic_bike 2022-11-06 01:46:17 2022-11-06 01:05:13 Wells St & Hubbard St TA1307000151 Aberdeen St & Jackson Blvd 13157 41.889906 -87.634266 41.877726 -87.654787 member
B5602D5BB3D517F6 electric_bike 2022-11-06 01:59:05 2022-11-06 01:02:03 Western Ave & Winnebago Ave 13068 California Ave & Milwaukee Ave 13084 41.915592 -87.687070 41.922695 -87.697153 member
4139B11634039661 classic_bike 2022-11-06 01:58:46 2022-11-06 01:11:33 Clark St & Grace St TA1307000127 Broadway & Berwyn Ave 13109 41.950780 -87.659172 41.978353 -87.659753 member

69 rows × 12 columns

Checked that 69 rows have incosistent data

Deleting those inconsistencies

In [15]:
new_data_year_no_duplicates = new_data_year_no_duplicates[new_data_year_no_duplicates['started_at'] < new_data_year_no_duplicates['ended_at']]

Creating some metrics to help ahead

In [16]:
# ------------------------------------------------------------------------------------------


# ride length
new_data_year_no_duplicates['ride_length'] = new_data_year_no_duplicates['ended_at'] - new_data_year_no_duplicates['started_at']
new_data_year_no_duplicates['ride_length'] = new_data_year_no_duplicates['ride_length'].dt.total_seconds()


# days of the week
new_data_year_no_duplicates['day_of_week'] = new_data_year_no_duplicates['started_at'].dt.dayofweek + 1 # have to adapt, because here monday = 0


# Saving the data in csv
new_data_year_no_duplicates.to_csv("Data_Cyclistic/data_12months_no_duplicate.csv")

After the cleaning process, we have 4437183 rows with data.

Analyze phase:¶

I've decided to put every calculation together in a cell, so the output will be like a resume of what has been done. If you want to see the code used to that calculation, just go to the code.

In [17]:
# Loading the DataFrame
data_clean = pd.read_csv("Data_Cyclistic/data_12months_no_duplicate.csv")

data_clean
Out[17]:
ride_id rideable_type started_at ended_at start_station_name start_station_id end_station_name end_station_id start_lat start_lng end_lat end_lng member_casual ride_length day_of_week
0 E1E065E7ED285C02 classic_bike 2022-02-19 18:08:41 2022-02-19 18:23:56 State St & Randolph St TA1305000029 Clark St & Lincoln Ave 13179 41.884621 -87.627834 41.915689 -87.634600 member 915.0 6
1 1602DCDC5B30FFE3 classic_bike 2022-02-20 17:41:30 2022-02-20 17:45:56 Halsted St & Wrightwood Ave TA1309000061 Southport Ave & Wrightwood Ave TA1307000113 41.929143 -87.649077 41.928773 -87.663913 member 266.0 7
2 BE7DD2AF4B55C4AF classic_bike 2022-02-25 18:55:56 2022-02-25 19:09:34 State St & Randolph St TA1305000029 Canal St & Adams St 13011 41.884621 -87.627834 41.879255 -87.639904 member 818.0 5
3 A1789BDF844412BE classic_bike 2022-02-14 11:57:03 2022-02-14 12:04:00 Southport Ave & Waveland Ave 13235 Broadway & Sheridan Rd 13323 41.948150 -87.663940 41.952833 -87.649993 member 417.0 1
4 07DE78092C62F7B3 classic_bike 2022-02-16 05:36:06 2022-02-16 05:39:00 State St & Randolph St TA1305000029 Franklin St & Lake St TA1307000111 41.884621 -87.627834 41.885837 -87.635500 member 174.0 3
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4437178 A303816F2E8A35A8 electric_bike 2023-01-11 17:46:23 2023-01-11 17:57:31 Clark St & Elm St TA1307000039 Southport Ave & Clybourn Ave TA1309000030 41.902634 -87.631591 41.920771 -87.663712 casual 668.0 3
4437179 BCDBB142CC610382 classic_bike 2023-01-30 15:08:10 2023-01-30 15:33:26 Western Ave & Leland Ave TA1307000140 Clarendon Ave & Gordon Ter 13379 41.966400 -87.688704 41.957867 -87.649505 member 1516.0 1
4437180 7D1C7CA80517183B classic_bike 2023-01-06 19:34:50 2023-01-06 19:50:01 Clark St & Elm St TA1307000039 Southport Ave & Clybourn Ave TA1309000030 41.902973 -87.631280 41.920771 -87.663712 casual 911.0 5
4437181 1A4EB636346DF527 classic_bike 2023-01-13 18:59:24 2023-01-13 19:14:44 Clark St & Elm St TA1307000039 Southport Ave & Clybourn Ave TA1309000030 41.902973 -87.631280 41.920771 -87.663712 casual 920.0 5
4437182 069971675AC7DC62 electric_bike 2023-01-02 13:48:29 2023-01-02 13:59:29 Clark St & Elm St TA1307000039 Southport Ave & Clybourn Ave TA1309000030 41.902822 -87.631687 41.920771 -87.663712 casual 660.0 1

4437183 rows × 15 columns

Calculations¶

In [18]:
days = ['sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday']
In [19]:
# Calculations
days = ['sunday', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday']

# Mean ride_length
mean_ride_length = np.average(data_clean['ride_length'])
print(f"Mean ride length {mean_ride_length} seconds")
# 1018.0223639638032 seconds
print("------------------------------------------------------")

# Max ride_length
max_ride_length = np.max(data_clean['ride_length'])
print(f"Max ride length {max_ride_length} seconds")
# 2061244.0000000002 seconds

print("------------------------------------------------------")

# Most used day of week
most_used_day = st.mode(np.array(data_clean['day_of_week']))
print(f"The most used day was: {days[st.mode(np.array(data_clean['day_of_week']))-1]}")
# 6 -> Friday

print("------------------------------------------------------")

# Average ride_length of members
mean_ride_length_members = np.average(data_clean[data_clean['member_casual']=='member']['ride_length'])
print(f"Mean ride length of members: {mean_ride_length_members} seconds")
# 743.92 seconds

print("------------------------------------------------------")

# Average ride_length of non members (casual)
mean_ride_length_casuals = np.average(data_clean[data_clean['member_casual']=='casual']['ride_length'])
print(f"Mean ride length of casuals: {mean_ride_length_casuals} seconds")
# 1429.11 seconds

# Non members may use more time because they do not use as a way to quickly move around places, they may use to walk around.
print("------------------------------------------------------")

# Most used day of week by members
most_used_day_members = st.mode(np.array(data_clean[data_clean['member_casual']=='member']['day_of_week']))
print(f"Most used day of week by members: {days[most_used_day_members-1]}")
# 2 -> Monday
print("------------------------------------------------------")

# Most used day of week by casuals
most_used_day_casuals = st.mode(np.array(data_clean[data_clean['member_casual']=='casual']['day_of_week']))
print(f"Most used day of week by casuals: {days[most_used_day_casuals-1]}")
# 6 -> Friday
print("------------------------------------------------------")

# Average ride_length by day of the week
ride_length_by_day = []
for i in range(len(days)):
  ride_length_by_day.append(np.average(data_clean[data_clean['day_of_week'] == i + 1 ]['ride_length']))
  print(f"Average time on {days[i]}: {np.average(data_clean[data_clean['day_of_week'] == i + 1 ]['ride_length'])}")
# in seconds
# Average time on sunday: 990.4536910566923
# Average time on monday: 887.3830605997467
# Average time on tuesday: 879.2172592014024
# Average time on wednesday: 913.2799763426103
# Average time on thursday: 975.6575417238967
# Average time on friday: 1231.9800748637176
# Average time on saturday: 1228.884441941905

print("------------------------------------------------------")

# Average ride_length by day of the week but only for members
ride_length_by_day_members = []
for i in range(len(days)):
  ride_length_by_day_members.append(np.average(data_clean[(data_clean['day_of_week'] == i + 1) & (data_clean['member_casual']=='member') ]['ride_length']))
  print(f"Average time of members on {days[i]}: {ride_length_by_day_members[i]}")
# Average time of members on sunday: 718.0785057330068
# Average time of members on monday: 703.4451079163837
# Average time of members on tuesday: 708.4531940595466
# Average time of members on wednesday: 719.0868667739798
# Average time of members on thursday: 731.1160278698136
# Average time of members on friday: 836.8331027650607
# Average time of members on saturday: 828.0943054331574

print("------------------------------------------------------")

# Average ride_length by day of the week but only for casuals
ride_length_by_day_casuals = []
for i in range(len(days)):
  ride_length_by_day_casuals.append(np.average(data_clean[(data_clean['day_of_week'] == i + 1) & (data_clean['member_casual']=='casual') ]['ride_length']))
  print(f"Average time of casuals on {days[i]}: {ride_length_by_day_casuals[i]}")
# Average time of casuals on sunday: 1478.9263399657832
# Average time of casuals on monday: 1277.5599405696905
# Average time of casuals on tuesday: 1228.4515566492441
# Average time of casuals on wednesday: 1267.0297971979421
# Average time of casuals on thursday: 1333.372923207027
# Average time of casuals on friday: 1597.9670304313597
# Average time of casuals on saturday: 1627.782605696265

print("------------------------------------------------------")

# Number of rides by day_of_week
rides_by_day = []

for i in range(len(days)):
  rides_by_day.append(len(data_clean[data_clean['day_of_week'] == i + 1 ]))
  print(f"Numbers of rides on {days[i]}: {rides_by_day[i]}")
# Numbers of rides on sunday: 595954
# Numbers of rides on monday: 623930
# Numbers of rides on tuesday: 628627
# Numbers of rides on wednesday: 654341
# Numbers of rides on thursday: 617392
# Numbers of rides on friday: 709556
# Numbers of rides on saturday: 607383


print("------------------------------------------------------")

# Number of rides by day_of_week but only members

rides_by_day_members = []

for i in range(len(days)):
  rides_by_day_members.append(len(data_clean[(data_clean['day_of_week'] == i + 1) & (data_clean['member_casual']=='member') ]))
  print(f"Numbers of rides of members on {days[i]}: {rides_by_day_members[i]}")
# Numbers of rides on sunday: 382609
# Numbers of rides on monday: 424032
# Numbers of rides on tuesday: 422190
# Numbers of rides on wednesday: 422440
# Numbers of rides on thursday: 366705
# Numbers of rides on friday: 341186
# Numbers of rides on saturday: 302973

print("------------------------------------------------------")

# Number of rides by day_of_week but only casuals
rides_by_day_casuals = []
for i in range(len(days)):
  rides_by_day_casuals.append(len(data_clean[(data_clean['day_of_week'] == i + 1) & (data_clean['member_casual']=='casual') ]))
  print(f"Numbers of rides of casuals on {days[i]}: {rides_by_day_casuals[i]}")
# Numbers of rides on sunday: 213345
# Numbers of rides on monday: 199898
# Numbers of rides on tuesday: 206437
# Numbers of rides on wednesday: 231901
# Numbers of rides on thursday: 250687
# Numbers of rides on friday: 368370
# Numbers of rides on saturday: 304410
Mean ride length 1018.0223639638032 seconds
------------------------------------------------------
Max ride length 2061244.0 seconds
------------------------------------------------------
The most used day was: friday
------------------------------------------------------
Mean ride length of members: 743.9176837388036 seconds
------------------------------------------------------
Mean ride length of casuals: 1429.1119023260217 seconds
------------------------------------------------------
Most used day of week by members: monday
------------------------------------------------------
Most used day of week by casuals: friday
------------------------------------------------------
Average time on sunday: 990.4536910566923
Average time on monday: 887.3830605997467
Average time on tuesday: 879.2172592014024
Average time on wednesday: 913.2799763426103
Average time on thursday: 975.6575417238967
Average time on friday: 1231.9800748637176
Average time on saturday: 1228.884441941905
------------------------------------------------------
Average time of members on sunday: 718.0785057330068
Average time of members on monday: 703.4451079163837
Average time of members on tuesday: 708.4531940595466
Average time of members on wednesday: 719.0868667739798
Average time of members on thursday: 731.1160278698136
Average time of members on friday: 836.8331027650607
Average time of members on saturday: 828.0943054331574
------------------------------------------------------
Average time of casuals on sunday: 1478.9263399657832
Average time of casuals on monday: 1277.5599405696905
Average time of casuals on tuesday: 1228.4515566492441
Average time of casuals on wednesday: 1267.0297971979421
Average time of casuals on thursday: 1333.372923207027
Average time of casuals on friday: 1597.9670304313597
Average time of casuals on saturday: 1627.782605696265
------------------------------------------------------
Numbers of rides on sunday: 595954
Numbers of rides on monday: 623930
Numbers of rides on tuesday: 628627
Numbers of rides on wednesday: 654341
Numbers of rides on thursday: 617392
Numbers of rides on friday: 709556
Numbers of rides on saturday: 607383
------------------------------------------------------
Numbers of rides of members on sunday: 382609
Numbers of rides of members on monday: 424032
Numbers of rides of members on tuesday: 422190
Numbers of rides of members on wednesday: 422440
Numbers of rides of members on thursday: 366705
Numbers of rides of members on friday: 341186
Numbers of rides of members on saturday: 302973
------------------------------------------------------
Numbers of rides of casuals on sunday: 213345
Numbers of rides of casuals on monday: 199898
Numbers of rides of casuals on tuesday: 206437
Numbers of rides of casuals on wednesday: 231901
Numbers of rides of casuals on thursday: 250687
Numbers of rides of casuals on friday: 368370
Numbers of rides of casuals on saturday: 304410

Members have the higher amount of rides of the past year (60% of them), with a mean time of almost 12 minutes. Casual riders other way, have a mean time of almost 22 minutes, but why?

When we see the usage of the service by the day of week and by type of user, wee see that members use most in the midweek, while casual riders use in the weekend. So, with a lower time, and a midweek usage, we can assume that members use the service to travel between locations (maybe to work), while casual riders use it as a leisure time.

Share phase¶

As we could see from the analysis calculations, Members tends to use Cyclistic bikes to help them to move in the midle of the city midweek, and casual riders use the service as a leisure time in the weekend. Each side sees the service in a different way.

I think that some graphics will help to visualize that difference.

Charts¶

Those that I do not see much important I hide.

In [20]:
# Pie Chart numbers of rides 

n_members = len(data_clean[data_clean['member_casual']=='member'])
n_casuals = len(data_clean[data_clean['member_casual']=='casual'])

data = {'labels': ['members', 'casuals'],
        'number of rides': [n_members, n_casuals]}

fig = px.pie(data, values='number of rides', names='labels', title='Number of rides last year')
fig.show()
In [21]:
# Bar chart: Average timeby type of user

data = {'labels': ['members', 'casuals'],
        'Mean time': [mean_ride_length_members, mean_ride_length_casuals]}

fig = px.bar(data, x='labels', y='Mean time', title = 'Average time by type of user')

fig.update_layout(xaxis=dict(
                    title = 'Type of Users',
                    ), 
                  yaxis=dict(
                    title='Mean time (seconds)',
                    side='left'                 
                    )
)

fig.show()
In [22]:
# Bar chart: Rides By day of week


data = {'Days': days,
        'Rides by day': rides_by_day}

fig = px.bar(data, x='Days', y='Rides by day', title = 'Rides By day of week')
fig.show()
In [23]:
# Bar chart: Rides By day of week (only members)


data = {'Days': days,
        'Rides by day': rides_by_day_members}

fig = px.bar(data, x='Days', y='Rides by day', title = 'Rides By day of week (only members)')
fig.show()
In [24]:
# Bar chart: Rides By day of week (only casuals)


data = {'Days': days,
        'Rides by day': rides_by_day_casuals}

fig = px.bar(data, x='Days', y='Rides by day', title = 'Rides By day of week (only casuals)')
fig.show()
In [25]:
# Bar chart: Mean time by day of week 


data = {'Days': days,
        'Mean time by day': ride_length_by_day}

fig = px.bar(data, x='Days', y='Mean time by day', title = 'Mean time by day of week')
fig.show()
In [26]:
# Bar chart: Mean time by day of week (only members)


data = {'Days': days,
        'Mean time by day': ride_length_by_day_members}

fig = px.bar(data, x='Days', y='Mean time by day', title = 'Mean time by day of week (only members)')
fig.show()
In [27]:
# Bar chart: Mean time by day of week (only casuals)


data = {'Days': days,
        'Mean time by day': ride_length_by_day_casuals}

fig = px.bar(data, x='Days', y='Mean time by day', title = 'Mean time by day of week (only casuals)')
fig.show()

Act phase¶

Now, after analyzing the data and extracted a story from it, what could we do to increase anual memberships?

I've though in some ideas while analyzing and visualizing the data:

  1. Marketing campaing promoting that the service may help in the midweek travel in the city;

  2. Develop an app with circuits that can be completed by week (this may attract casual riders to sign the membership);

  3. Marketing campaigns to make more activities (may attract more casual riders and improve members usage in the weekend)